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Scientists have grappled with reconciling biological evolution’? with the immutable 
laws of the Universe defined by physics. These laws underpin life’s origin, evolution 
and the development of human culture and technology, yet they do not predict the 
emergence of these phenomena. Evolutionary theory explains why some things exist 
and others do not through the lens of selection. To comprehend how diverse, open- 
ended forms can emerge from physics without an inherent design blueprint, anew 
approach to understanding and quantifying selection is necessary® °. We present 
assembly theory (AT) as a framework that does not alter the laws of physics, but 
redefines the concept of an ‘object’ on which these laws act. AT conceptualizes objects 
not as point particles, but as entities defined by their possible formation histories. 
This allows objects to show evidence of selection, within well-defined boundaries of 
individuals or selected units. We introduce a measure called assembly (A), capturing 
the degree of causation required to produce a given ensemble of objects. This approach 
enables us to incorporate novelty generation and selection into the physics of complex 
objects. It explains how these objects can be characterized through a forward dynamical 
process considering their assembly. By reimagining the concept of matter within 
assembly spaces, AT provides a powerful interface between physics and biology. It 
discloses a new aspect of physics emerging at the chemical scale, whereby history and 
causal contingency influence what exists. 


In evolutionary theory, natural selection’ describes why some things 
exist and others do not”. Darwin’s theory of evolution and its modern 
synthesis point out how selection among variants in the past generates 
current functionality’, as well as a forward-looking process‘. Neither 
addresses the space in which new phenotypic variants are generated. 
Physics can, in theory, take us from past initial conditions to current 
and future states. However, because physics has no functional view 
of the Universe, it cannot distinguish novel functional features from 
random fluctuations, which means that talking about true novelty is 
impossible in physical reductionism. Thus, the open-ended generation 
of novelty’ does not fit cleanly in the paradigmatic frameworks of either 
biology’ or physics’, and so must resort ultimately to randomness’. 
There have been several efforts to explore the gap between physics and 
evolution”””. This is because a growing state space over time requires 
the exploration of a large combinatorial set of possibilities", such as 
inthetheory of the adjacent possible”. However, the search generates 
an unsustainable expansion in the number of configurations possi- 
ble ina finite universe in finite time, and does not include selection. 
In addition, this approach has limited predictive power with respect 
to why only some evolutionary innovations happen and not others. 
Other efforts have studied the evolution of rules acting on other rules”; 
however, these models are abstract so it is difficult to see how they can 
describe—and predict—the evolution of physical objects. 


Here, we introduce AT, which addresses these challenges by describing 
how novelty generation and selection can operate in forward-evolving 
processes. The framework of AT allows us to predict features of new 
discoveries during selection, and to quantify how much selection was 
necessary to produce observed objects'** without having to prespecify 
individuals or units of selection. In AT, objects are not considered as 
point particles (as in most physics), but are defined by the histories of 
their formation as an intrinsic property, mapped as an assembly space. 
The assembly space is defined as the pathway by which a given object 
can be built from elementary building blocks, using only recursive 
operations. For the shortest path, the assembly space captures the 
minimal memory, interms of the minimal number of operations neces- 
sary to construct an observed object based on objects that could have 
existed in its past. One feature of biological assemblies of objects is 
multiple realizability wherein biological evolution can produce func- 
tionally equivalent classes of objects with modular use of units in many 
different contexts. For each unit, the minimal assembly is unique and 
independent of its formation, and therefore accounts for multiple 
realizability in how it could be constructed’”®. 

We introduce the foundations of AT and its implementation to 
quantify the degree of selection and evolution found ina collection of 
objects. Assembly is a function of two quantities: the number of copies 
of the observed objects and the objects’ assembly indices (an assembly 
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index is the number of steps on a minimal path producing the object). 
Assembly captures the amount of memory necessary to produce a 
selected configuration of historically contingent objects in a manner 
similar to how entropy quantifies the information (or lack thereof) 
necessary to specify the configuration of an ensemble of point particles, 
but assembly differs from entropy because of its explicit dependence 
onthe contingency in construction paths intrinsic to complex objects. 
We demonstrate how AT leads to a unified language for describing selec- 
tion and the generation of novelty, and thereby produce a framework 
to unify descriptions of selection across physics and biology. 


Assembly theory 


The concept of an object in AT is simple and rigorously defined. An 
object is finite, is distinguishable, persists over time and is breakable 
such that the set of constraints to construct it from elementary build- 
ing blocks is quantifiable. This definition is, in some sense, opposite 
to standard physics, which treats objects of interest as fundamental 
and unbreakable (for example, the concept of ‘atoms’ as indivisible, 
which now applies to elementary particles). In AT, we recognize that 
the smallest unit of matter is typically defined by the limits of obser- 
vational measurements and may not itself be fundamental. A more 
universal concept is to treat objects as anything that can be broken 
and built. This allows us to naturally account for the emergent objects 
produced by evolution and selection as fundamental to the theory. The 
concept of copy number is of foundational importance in defining a 
theory that accounts for selection. The more complex a given object, 
the less likely an identical copy can exist without selection of some 
information-driven mechanism that generates that object. An object 
that exists in multiple copies allows the signatures describing the set 
of constraints that built it to be measured experimentally. For example, 
mass spectrometry can be used to measure assembly for molecules, 
because it can measure how molecules are built by making bonds”. 


Assembly index and copy number 


To construct an assembly space for an object, one starts from elemen- 
tary building blocks comprising that object and recursively joins these 
to form new structures, whereby, at each recursive step, the objects 
formed are added back to the assembly pool and are available for sub- 
sequent steps (Supplementary Information Sections 1 and 2). AT cap- 
tures symmetry breaking arising along construction paths due to 
recursive use of past objects that can be combined in different ways to 
make new things. For any given object i, we can define its assembly 
space as all recursively assembled pathways that produce it. For each 
object, the most important feature is the assembly index a;, which 
corresponds to the shortest number of steps required to generate the 
object from basic building blocks. This can be quantified as the length 
of the shortest assembly pathway that can generate the object (Fig. 1). 

In chemical systems, molecular assembly theory treats bonds as 
the elementary operations from which molecules are constructed. 
The shortest path to build a given molecule can be found by breaking 
its bonds and then ordering its motifs in order of size, starting from 
atoms and moving to larger motifs by adding bonds in sequence. Given 
a motif generated on the path, the motif remains available for reuse. 
The recursivity allows identifying the shortest construction path with 
parts already built on that path, allowing us to quantify the minimum 
number of constraints, or memory size, to construct the molecule. The 
assembly index can be estimated from any complex discrete object with 
well-defined building blocks, which can be broken apart, as shown in 
Fig. 1. At every step, the size of the object increases by at least one. The 
number of total possible steps, although potentially large, is always 
finite for any finite object and thus the assembly index is computable 
in finite time. For molecules, the assembly index can be determined 
experimentally. 


322 | Nature | Vol 622 | 12 October 2023 


A hallmark feature of life is how complex objects are generated by 
evolution, of which many are functional. For example, a DNA molecule 
holds genetic information reliably and can be copied easily. By contrast, 
arandom string of letters requires much information to describe it, 
but is not normally seen as very complex or useful. Thus far, science 
has not been able to find a measure that quantifies the complexity of 
functionality to distinguish these two cases. Here we overcome this 
inherent problem by pointing out another feature of the evolutionary 
process: the complex and functional objects it generates take many 
steps to make, and selection allows many identical copies of these 
objects. Therefore, an evolutionary process can be identified by the 
production of many identical, or near-identical, multistep objects. 
The assembly index onits own cannot detect selection, but copy num- 
ber combined with the assembly index can. This approach defines a 
new way to measure complexity in terms of the hierarchy of causation 
stemming from selection at different levels. 

Because we do not typically know the full assembly trajectory of 
an object, we instead adopt a conservative alternative. AT finds the 
minimal number of steps to produce the object. We assume that every 
subobject, once available, can be used as often as needed to generate 
the object. A different approach would be to use Kolmogorov complex- 
ity?°” applied to a given molecule, but this requires starting witha 
graphical representation, and a program to compute the graph of that 
molecule. The Kolmogorov complexity of a string is the shortest pro- 
gram that will output that string for a programming language capable 
of universal computation. This measure cannot be easily computed, 
because checking whether any single program will output the string is 
uncomputable, as it involves, at least, deciding whether the program 
stops. Running this program reflects nothing of the underlying process 
of how the molecule was constructed. Only late in the evolutionary 
process will molecules be produced by anything starting to resemble 
Turing machines, loops, stacks, tapes and so on”. Thus, using universal 
computation to assess molecules adds unrealistic dynamics, making 
the answer uncomputable. The assembly measure that we have pre- 
sented here both uses realistic dynamics for molecules, using bonds as 
building blocks, and is computable for any molecule. The main work 
for detecting evolution and memory is done here by combining the 
assembly index and copy number of the objects. 

The aim of AT is to develop a new understanding of the evolution 
of complex matter that naturally accounts for selection and history 
in terms of what operations are physically possible in constructing 
an object‘. We will discuss AT as applied to chemical systems as the 
main application in this manuscript because their assembly index has 
been experimentally measured. For molecules, assembly index has a 
clear physical interpretation and has been validated as quantifying 
evidence of selection in its application to the detection of molecular 
signatures of life. However, we anticipate the theory to be sufficiently 
general to apply to a wide variety of other systems including poly- 
mers, cell morphology, graphs, images, computer programs, human 
languages and memes, as well as many others. The challenge in each 
case will be to construct an assembly space that has a clear physical 
meaning in terms of what operations can be caused to occur to make 
the object” (Fig. 1). 

In AT there are two important features of the context the object 
is found in. First, there must be objects in its environment that can 
constrain the steps to assemble the object and second these objects 
themselves have been selected because they must be retained over 
subsequent steps to physically instantiate the memory needed to build 
the target object. Among the most relatable examples are enzyme 
catalysts in biochemistry, which permit the formation of very unlikely 
molecules in large numbers because the enzymes themselves are also 
selected to exist with many copies. We make no distinction between 
the traditional notion of biological ‘individual’ and objects that are 
selected in the environment to quantify the selection necessary to 
produce a given configuration. Thus, our approach naturally accounts 
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Fig. 1| Assembly index and shortest path(s). a-c, AT is generalizable to 
different classes of objects, illustrated here for three different general types. 
a, Assembly pathway to construct diethyl phthalate molecule considering 
molecular bonds as the building blocks. The figure shows the pathway starting 
with the irreducible constructs to create the molecule with assembly index 8. 


for well-known phenomena, suchas niche construction, whereby organ- 
isms and environment are co-constructed and co-selected. 

Copy number is important because a single example of a highly 
complex molecule (with a very high assembly index) could poten- 
tially be generated in a series of random events that become increas- 
ingly less likely with increasing assembly index. If we consider a 
forward-building assembly process (see Supplementary Information 
Sections 1and 2 for details), without a specific target in mind, the num- 
ber of possible objects that could be built at each recursive step grows 
super-exponentially in the absence of any constraints. The likelihood 
of finding and measuring more than one copy of an object therefore 
decreases super-exponentially with increasing assembly index in the 
absence of selection for a specified target. Objects with high assembly 
index, found in abundance, provide evidence of selection because of 
the combinatorially growing space of possible objects at each recursive 
assembly step (Fig. 2). Finding more than one identical copy indicates 
the presence of anon-random process generating the object. 


oe eh 


b, Assembly pathway of a peptide chain by considering building blocks as 
strings. Left, four amino acids as building blocks. Middle, the actual object 
and its representation as a string. Right, assembly pathway to construct the 
string. c, Generalized assembly pathway of an object comprising discrete 
components. 


The assembly equation 


We define assembly as the total amount of selection necessary to pro- 
duce an ensemble of observed objects, quantified using equation (1): 


N E 
A=} (8) (1) 


where A is the assembly of the ensemble, a; is the assembly index of 
objecti, n;is its copy number, Nis the total number of unique objects, 
eis Euler’s number and N; is the total number of objects in the ensem- 
ble. Normalizing by the number of objects in the ensemble allows 
assembly to be compared between ensembles with different numbers 
of objects. 

Assembly quantifies two competing effects, the difficulty of discov- 
ering new objects, but, once discovered, some objects become easier 
to make; this is indicative of how selection was required to discover 
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Fig. 2 | Selection in assembly space. a, Pictorial representation of the 
assembly space representing the formation of combinatorial object space 
from building blocks and physical constraints. b, Observed copy number 
distributions of objects at different assembly indices as an outcome of selection 
ornoselection.c, Representation of physical pathways to construct objects 
with undirected and directed pathways (selected) leading to the lowand high 
copy numbers of the observed object. 


and make them. The exponential growth of assembly with depth in 
assembly space, as quantified by assembly index, is derived by consid- 
eringa linearly expanding assembly pool that has objects that combine 
at step a> a + 1, whereby an object at the assembly index a combines 
with another object from the assembly pool. Discovering new objects 
at increasing depth in an assembly space gets increasingly harder with 
depth because the space of possibilities expands exponentially. Once 
the pathway for a new object has been discovered, the production of 
an object (copy number greater than 1) gets easier as the copy number 
increases because a high copy number implies that an object can be 
produced readily in a given context. Thus, the hardest innovation is 
making an object for the first time, whichis equivalent to its discovery, 
followed by making the first copy of that object, but once an object 
exists in very high abundance it must already be relatively easy to make. 
Hence, assembly (A) scales linearly with copy number for more than 
one object fora fixed cost per object oncea process has been discovered 
(see Supplementary Information Section 3 for additional details). 
Increasing assembly (A) results from increasing copy numbers nand 
increasing assembly indices a. If high values of assembly can be shown 
to capture cases in which selection has occurred, it implies that finding 
high assembly index objects in high abundance is a signature of selec- 
tion. In AT, the information required at each step to construct the object 
is ‘stored’ within the object (Fig. 2). Each time two objects are combined 
from an assembly pool, the specificity of the combination process 
constitutes selection. As we will show, randomly combining objects 
within the assembly pool at each step does not constitute selection 
because no combinations exist in memory to be used again for building 
the same object. If, instead, certain combinations are preferentially 
used, it implies that a mechanism exists that selects the specific 
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operations and, by extension, specific target objects to be generated. 
Later we will quantify the degree of selectivity by parameter a in the 
growth dynamics, which allows parameterizing selection in an empir- 
ically observable manner by parameterizing reuses of specific sets of 
operations (see Supplementary Information Section 3 for example). 

Assembly as given in equation (1) is determined for identified finite 
and distinguishable objects (with copy number greater than 1) and 
their distinct assembly spaces. However, in real samples, there are 
almost always several different coexisting objects, which will include 
acommon history for their formation. Transistors, for example, are 
used across several different technologies, suggesting acommon sub- 
space in the assembly spaces of many modern technologies that 
includes transistor-like objects. This common subspace, constituting 
the overlap in the assembly paths of distinct structures, is called a 
co-assembly space. By contrast, ajoint assembly space of several objects 
is the combined assembly space required to generate those objects. 
AS a potential extension of the assembly equation, to account for the 
joint assembly of objects, we expand the formulation of the assembly 
equation that includes the quantification of shared pathways to con- 
struct objects to determine the assembly (A) of an ensemble with dif- 
ferent objects that share common history (Supplementary Information 
Section 3). 


Selection within assembly spaces 


The concept of the assembly space allows us to understand how selec- 
tion and historical contingency impose constraints on what can be 
made in the future. By aiming to detect ‘selection’, we mean a process 
similar to selection in Darwinian evolution. We do not, however, model 
functional differences that selection might act on. Instead, we account 
only for the specificity of selection—that some objects are more likely 
to be used to make new things and some are less likely. The only func- 
tionality we want to detect or describe is inthe memory of the process 
to generate the object, with examples including a metabolic reaction 
network or a genome. This allows the three Lewontin conditions for evo- 
lution to hold”. A key feature of assembly spaces is that they are com- 
binatorial, with objects combined at every step. Combinatorial spaces 
donot play a prominent role in current physics, because their objects 
are modelled as point particles and not as combinatorial objects (with 
limited exceptions). However, combinatorial objects are important in 
chemistry, biology and technology, in which most objects of interest 
(if not all) are hierarchical modular structures. More objects exist in 
assembly space than can be built in finite time with finite resources 
because the space of possibilities grows super-exponentially with the 
assembly index. To tame this explosive growth, in AT historical contin- 
gency is intrinsic with the space built compositionally, where items are 
combined recursively (accounting for hierarchical modularity) and 
this substantially constrains the number of possible objects. It is the 
combination of this compositionality with combinatorics that allows 
us to describe selection (Fig. 3). 

To produce an assembly space, an observed object is broken down 
recursively to generate a set of elementary building units. These units 
can be used to then recursively construct the assembly pathways of the 
original object(s) to build what we call assembly observed, Ao. Ao cap- 
tures all histories for the construction of the observed object(s) from 
elementary building blocks, consistent with what physical operations 
are possible. Because objects in AT are compositional, they contain 
information about the larger space of possible objects from which 
they were selected. To see how, we first build an assembly space from 
the same building blocks in Ao, which include all possible pathways for 
assembling any object composed of the same set elementary building 
blocks as our target object. The space so constructed is the assembly 
universe (A,). 

Inthe assembly universe, all objects are possible with no rules, yield- 
ing acombinatorial explosion and with double exponential growthin 
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Fig. 3 | Assembly spaces. a, Assembly observed of the three objects shown as 
graphs (P,, P, and P}) with their shared minimal construction process called 
their ‘joint assembly space’. b, Illustration of the expansion of the assembly 
universe, assembly possible, assembly contingent and assembly observed 


the number of objects, as is characteristic of exploding state spaces and 
the adjacent possible (see Supplementary Information Section 4 for 
details). Although mathematically well defined, this double exponential 
growth is unphysical because the physical processes place restric- 
tions on what is possible (in the case of molecules, an example is how 
quantum mechanics constrains the numbers of bonds per atom). The 
assembly universe also has no concept of directionality in time, as there 
is no ordering to construction processes. Because everything can exist, 
thereis animplication that objects can be constructed independently of 
what has existed in the past and of resource or time constraints, which 
is not what we observe in the real universe. For most systems of interest, 
including in molecular assembly spaces, the number of molecules in 
the assembly universe is orders of magnitude larger than the amount 
of matter available in the cosmologically observable universe. There is 
no way to computationally build and exhaust the entire space, even for 
objects with relatively low assembly indices. For larger objects, such as 
proteins, this canbe truly gigantic”. In AT, we do not observe all possible 
objects at a given depth in the assembly space because of selection, 
more reflective of what we see in the real universe. We next show how 
taking account of memory and resource limitation severely restricts the 
size of the space of what can be built, but also allows higher-assembly 
objects to be built before exhausting resources constructing all the 
possible lower-assembly objects. AT can account for selection precisely 
because of the historical contingency inthe recursive construction of 
objects along assembly paths. 

Assembly possible (A,) is the space of physically possible objects, 
which can be generated by means of the combinatorial expansion of 
all the known physical rules of object construction and allowing all 
rules to be available at every step to every object. This can be described 
by a dynamical model representing undirected forward dynamics in 
AT. When an object with assembly index a combines with its own his- 
tory, its assembly index increases by one,a > a + 1. Ifthe resulting object 
can be made by means of other, shorter path(s), its assembly index will 
be smaller thana + lor evena. Another assumption behind the dynam- 
ical model of undirected dynamics is a microscopically driven stochas- 
tic rule that uses existing objects uniformly: the probability of 
choosing an object with assembly index a to be combined with any 
other object is proportional to N,, the number of objects with assembly 
index a (see Supplementary Information Section 5 for further details). 

Within assembly possible, assembly contingent (Ac) describes the 
possible space of objects where history, and selection on that history, 
matter. Historical contingency is introduced by assuming that only 
the knowledge or constraints built on a given path can be used in the 
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(see text for details). Assembly universe has no dynamics and is displayed with 
assembly steps as the time axis. Note that the figure illustrates their nested 
structure only, not the relative size of the spaces where each set is typically 
exponentially larger than the subset. 


future, or with different paths interacting in cases in which selected 
objects that had not interacted previously now interact. We define the 
probability P, of an object being selected with assembly index (a) as 
P,~ (N,)*, where N, is the number of objects with assembly index a. 
Here, a parameterizes the degree of selection: for a= lall objects that 
have been assembled inthe past are available for reuse, and forO <a <1, 
only a subset (that grows non-linearly with assembly index) are avail- 
able for reuse, indicating that selection has occurred. This leads to the 
growth dynamics: 


Ng +1 


de Kala)" (2) 


where k, represents the rate of discovery (expansion rate) of new 
objects. For a=1, there is historical dependence without selection. We 
build assembly paths by taking two randomly chosen objects from the 
assembly pool and combining them; if a new object is formed, it is 
added back into the pool. Here we are building random objects, but 
these are fundamentally different from random combinatorial objects 
because the randomness we implement is distributed across the recur- 
sive construction steps leading to an object (see Supplementary Infor- 
mation Section 5 for solutions). The case of a =1, in which there is 
historical dependence but no selection, defines the boundary of assem- 
bly possible. 

Within assembly possible, the assembly contingent (A,) is the space 
of possible configurations of objects whereO < a < 1, thatis, where selec- 
tion is possible, and the objects found in the space are controlled bya 
path-dependency contingent on each object that has already been built. 
The growth of the assembly contingent is much slower than exponential; 
indeed, not all possible paths are explored equally. Instead, the dynam- 
ics are channelled by constraints imposed by the selectivity emerging 
along specific paths. Indeed, a signature of selection in assembly spaces 
is a slower-than-exponential growth of the number of unique objects. 
To show this, we use a simple phenomenological model of linear poly- 
mersto demonstrate how assembly differentiates cases when selection 
happens. Starting with a single monomer in the assembly pool, the 
undirected exploration process combines two randomly selected pol- 
ymers and adds them back to the assembly pool. Inthe case of directed 
exploration with selection, the polymer that has been created most 
recently is selected to join a randomly selected polymer from the assem- 
bly pool. For both directed and undirected exploration, this process 
was iterated up to 10* steps and repeated 25 times. For each observed 
polymer inthe assembly pool, the shortest pathway was generated. For 
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Fig. 4| Undirected and directed exploration ina forward assembly process. 
a, Thejoint assembly space of polymeric chains (with their lengths indicated) 
after 30 steps created by combining randomly selected polymers from the 
assembly pool. The length of the realized polymers is shown in blue (observed 
nodes), whereas nodes shown in black represent polymers that have not been 
realized but are part of the joint assembly space of all realized objects (contingent 
nodes). For simplicity of representing the joint assembly space, the edge nodes 
(shown in red) represent the combined node along the directed graph. b, The 
comparison between undirected and directed exploration after 100 assembly 
steps using a graph with radial embedding (observed and contingent nodes 
shown in red and grey, respectively). c, The mean and standard deviation of 

the exploration ratio (defined by the ratio of the number of observed nodes 
and the number of total nodes, which includes observed and contingent nodes) 
and mean maximum assembly index. nis 25 runs all averaged up to 10* 
assembly steps. 


each run, the assembly space of multiple coexisting polymers, their 
joint assembly space, was approximated by the union of the shortest 
pathways of all observed polymers. An example of joint assembly space 
in an undirected exploration up to 30 steps is shown in Fig. 4a. 
Comparison between the explored joint assembly space in undi- 
rected and directed exploration up to 100 steps is shown in Fig. 4b 
(see Supplementary Information Section 6 for details). To quantify the 
degree of exploration at a given assembly step, we calculated the 
exploration ratio, defined by the ratio of observed nodes to total num- 
ber of nodes present in the joint assembly space. Figure 4c shows the 
exploration ratio and the mean maximum assembly index observed, 
approximated by log, (n), where nis the length of the polymer for the 
undirected and directed exploration processes (both upper and lower 
bounds scale as log, (n) in leading order). Here, the mean maximum 
assembly index was estimated by calculating the assembly index of the 
mean value of the longest observed polymeric chains over 25 runs. 
Comparing the directed process to the undirected exploration illus- 
trates a central principle: the signal of selection is simply a lower explo- 
ration ratio and higher complexity (as defined by the maximum 
assembly index). The observation of a lower exploration ratio in the 
directed process than in the undirected process is the evidence of 
the presence of selectivity in the combination process between the 
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polymers existing in the assembly pool. The process representing sort- 
ing and selecting chains within the assembly pool represents an 
outcome ofa physical process leading to selection (see Supplementary 
Information Section 7 for an additional model). 

We conjecture that, the ‘more assembled’ an ensemble of objects, 
the more selection is required for it to come into existence. The his- 
torical contingency in AT means that assembly dynamics explores 
higher-assembly objects before exhausting all lower-assembly 
objects, leading to a vast separation in scales separating the number 
of objects that could have been explored versus those that are actually 
constructed following a particular path. For example, proteins built 
both from Dand Lamino acids and their pathways are part of assembly 
possible, but, within an assembly contingent trajectory, only proteins 
constructed out of Lamino acids might be present, because of early 
selection events. This early symmetry breaking along historically con- 
tingent paths is a fundamental property of all assembly processes. It 
introduces an ‘assembly time’ that ticks at each object being made: 
assembly physics includes an explicit arrow of time intrinsic to the 
structure of objects. 


Assembly unifies selection with physics 


In the real universe, objects can be built only from parts that already 
exist. The discovery of new objects is therefore historically contingent. 
The rate of discovery of new objects can be defined by the expansion 
rate (ka) from equation (2), introducing a characteristic timescale T, ~ re 
defined as the discovery time. In addition, once a pathway to build an 
object is discovered, the object can be reproduced if the mechanism 
in its environment is selected to build it again. Thus far, we have con- 
sidered discovery dynamics within the assembly spaces and did not 
account for the abundance or copy number of the observed objects 
when discovered. To include copy number in the dynamics of AT, we 
must introduce a second timescale, the rate of production (kp )ofa 
specific object, with a characteristic production timescale 7, AA 
(Fig. 5). For simplicity, we assume that selectivity and interaction among 
emerging objects are similar across assembled objects. Defining these 
two distinct timescales for initial discovery of an object and making 
copies of existing objects allows us to determine the regimes in which 
selection is possible (Fig. 5). 

For? > 1, whereby objects are discovered quickly but reproduced 
slowly, the expansion of assembly space is too fast under mass con- 
straints to accumulate a high abundance of any distinguishable objects, 
leading to a combinatorial explosion of unique objects with low copy 
numbers. This is consistent with how some unconstrained prebiotic 
synthesis reactions, such as the formose reaction, end up producing 
tar, which is composed of a large number of molecules with too low a 
copy number to be individually identifiable”’”®. Selection and evolution 
cannot emerge if new objects are generated on timescales so fast that 
resources are not hl for making more copies of those objects 
that already exist. For 2 <1, objects are reproduced quickly but new 
ones are discovered slowly. Here resources are primarily consumedin 
producing additional copies of objects that already exist. Typically, 
new objects are discovered infrequently. This leads toa high abundance 
of objects produced by extreme constraints, which could limit the 
further growth of assembly space. This illustrates how exploration 
versus exploitation can play out in AT. Significant separation of the two 
timescales of discovery of new objects and (re)production of selected 
objects results in either a combinatorial explosion of objects with low 
copy numbers or, conversely, high copy numbers of low assembly 
objects. In both cases, we will not observe trajectories that grow more 
complex structures. 

The emergence of selection and open-ended evolution in a physical 
system should occur in the transition regime where there is only a small 
separation in the timescales between discovering new objects and 
reproducing ones that are selected, for example the region located 
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Fig. 5| Selection and evolution in assembly space. a, Assembly processes 
with and without selection. The selection process is defined by a transition 
from undirected to directed exploration. The parameter a represents the 
selectivity of the assembly process (a= 1: undirected/random expansion, @<1: 
directed expansion). Undirected exploration leads to the fast homogeneous 
expansion of discovered objects in the assembly space, whereas directed 
exploration leads toa process that is more like a depth-first search. Here, tqis 
the characteristic timescale of discovery, determining the growth of the 
expansion front, and r, is the characteristic timescale of production that 
determines the rate of formation of objects (increasing copy number). b, Rate 
of discovery of unique objects at assembly a + 1 versus number of objects at 
assembly a. The transition of a=1toa<1represents the emergence of selectivity 
limiting the discovery of new objects. c, Phase space defined by the production 
(Tp) and discovery (T4) timescales. The figure shows three different regimes: 

(1) Ta < Tp, (2) T4> Tp, and (3) Ta Tp. Selection is unlikely to emerge in regimes1 
and 2, andis possible in regime 3. 


between T4 < ņ and 4> 7, (Fig. 5). To investigate discovery and pro- 
duction dynamics simultaneously, we introduce mass action kinetics 
inthe framework of AT. Our aim is to demonstrate how the generation 
of novelty can be described alongside selection in a forward process 
(thus unifying key features of life with physics) and how measuring 
assembly identifies how much selection occurred. We do so by study- 
ing phenomenological models, with the understanding that we are 
putting selection in by hand in our examples to demonstrate founda- 
tional principles of how assembly quantifies selection. To explore this, 
we consider a forward assembly process whereby the copy numbers 
of emerging objects follow homogeneous kinetics, together with the 
discovery dynamics as given by equation (2). With the discovery of new 
unique objects over time, symmetry breaking in the construction of 
contingent assembly paths will create a network of growing branches 
within the assembly possible. In principle, interactions among existing 
objects and external factors lead to discovery of new objects, expand- 
ing the space of possible future objects. Such events can drastically 


change the copy number distribution of objects at various assembly 
indices, depending on the emerging kinetics in the formation of new 
objects. By combining discovery and production kinetics ina simplified 
formulation, we estimate copy numbers of objects at different assem- 
bly indices and show assembly of the ensemble over time inthe forward 
process at different degrees of selection (see Supplementary Informa- 
tion Section 8 for an example). 

The interplay between the two characteristic timescales describes 
how discovery dynamics (tT, = 1/k,) and forward kinetics (Ca 1/k,), 
together with selection (characterized by the selection parameter a), 
are essential for driving processes towards creating higher-assembly 
objects. This is characteristic of trajectories within assembly contin- 
gent. Assembly captures key features of how the open-ended growth 
of complexity can occur within a restricted space only by generating 
new objects with increasing assembly indices, while also producing 
them with a high copy number. Selectivity (a < 1) together with com- 
parable production timescales (t4 = 1,) is essential for the production 
of high assembly ensembles. This suggests that selectivity in an 
unknown physical process can be explained by experimentally detect- 
ing the number of objects, their assembly index and copy number as 
a function of time. Considering molecules as objects and assuming 
that molecules observed using analytical techniques such as mass 
spectrometry implies a high copy number, the discovery rate and the 
selection index (a) can be computed from the temporal data of 
observed molecules at all assembly indices. 


Conclusions 

We have introduced the foundations of AT and how it can be imple- 
mented to quantify the degree of selection found in an ensemble of 
evolved objects, agnostic to the detailed formation mechanisms of 
the objects or knowing a priori which objects are products of units 
of selection. To do so, we introduced a quantity, assembly, built from 
two quantities: the number of copies of an object and its assembly 
index, where the assembly index is the minimal number of recursive 
steps necessary to build the object (its size). We demonstrated how AT 
allows a unified language for describing selection and the generation 
of novelty by showing how it quantifies the discovery and production 
of selected objects in a forward process described by mass action 
kinetics. AT provides a framework to unify descriptions of selection 
across physics and biology, with the potential to build a new physics 
that emerges in chemistry in which history and causal contingency 
through selection must start to play a prominent role in our descrip- 
tions of matter. For molecules, computing the assembly index is 
not explicitly necessary, because the assembly index can be probed 
directly experimentally with high accuracy with spectroscopy tech- 
niques including mass spectroscopy, infrared and nuclear magnetic 
resonance spectroscopy”. 
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Methods 


All the calculations were performed using Mathematica 13 (Wolfram 
Ltd). In addition, assembly index calculations on polymeric strings in 
the Supplementary Information were performed using a string assem- 
bly calculator previously developed using Python and C++. 


Reporting summary 
Further information on research design is available in the Nature 
Portfolio Reporting Summary linked to this article. 


Data availability 


All Mathematica Notebooks used to perform the calculations are 
available at https://github.com/croningp/assemblyphysics. The 
string assembly calculator and the dataset of assembly index calcula- 
tions is available from the Zenodo repository https://doi.org/10.5281/ 
zenodo.8017327. 
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